SwePub

Result list for search "db:Swepub ;pers:(Lu Zhonghai);pers:(Zheng Lirong)"

  • Results 1-10 of 11
1.
  • Ma, Ning, et al. (authors)
  • A 101.4 GOPS/W Reconfigurable and Scalable Control-centric Embedded Processor for Domain-specific Applications
  • 2016
  • In: Proceedings - IEEE International Symposium on Circuits and Systems. - : IEEE. - 9781479953400 ; , pp. 1746-1749
  • Conference paper (peer-reviewed). Abstract:
    • Increasing energy efficiency and performance while providing customizability and scalability is vital for embedded processors adapting to domain-specific applications such as the Internet of Things. In this paper, we propose a reconfigurable and scalable control-centric architecture, and implement a design consisting of two cores and an on-chip multi-mode router in 65 nm technology. Reconfigurability is enabled by the restructurable sequence mapping table (SMT), which makes the functional units reorganizable. Owing to the integration of the multi-mode router, an on-chip or inter-chip network for multi-/many-core computing can be composed for performance extension on demand, even in the post-fabrication stage. The control-centric design simplifies the control logic, shrinks the non-functional units and orchestrates the operations to increase hardware utilization and reduce excessive data movement for high energy efficiency. As a result, the processor can conduct both general-purpose processing with 29% smaller code size and application-specific processing with over 10 times performance improvement when implementing AES by SMT. The dual-core processor consumes 19.7 μW/MHz with a die size of 3.5 mm². The achieved energy efficiency is 101.4 GOPS/W.
2.
  •  
3.
  • Ma, Ning, et al. (authors)
  • A Hierarchical Reconfigurable Micro-coded Multi-core Processor for IoT Applications
  • 2014
  • In: 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC). - 9781479958108
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents a micro-coded multi-core processor featuring reconfigurability and scalability with high energy efficiency for IoT domain-specific applications. By simplifying the control logic and removing the pipelines, the gate count of one core is minimized to 14 K. Meanwhile, all the hardware units are directly controlled and can be reorganized by the long microinstructions. High utilization of the hardware is thus achieved when designing the micro programs properly. Furthermore, both the ISAs for C and Java have been implemented by the micro programs to supply the general-purpose programmability. Besides, application-specific instructions can be further developed once higher performance is demanded in specific scenarios. Depending on the performance requirement, the activity and working strategies of the cores are adjustable. Moreover, several processors can be further connected to construct a network with the integrated router for even higher performance. As a case study, the AES encryption is implemented using both C and micro programs. More than 10 times of performance improvement is achieved when using micro programs on the single core, and 20 times on two cores.
4.
  • Ma, Ning, et al. (authors)
  • Design and Implementation of Multi-mode Routers for Large-scale Inter-core Networks
  • 2016
  • In: Integration. - : Elsevier. - 0167-9260 .- 1872-7522. ; 53, pp. 1-13
  • Journal article (other academic/artistic). Abstract:
    • Constructing on-chip or inter-silicon (inter-die/inter-chip) networks to connect multiple processors extends the system capability and scalability. It is a key issue to implement a flexible router that can fit into various application scenarios. This paper proposes a multi-mode adaptable router that can support both circuit and wormhole switching with supplying flexible working strategies for specific traffic patterns in diverse applications. The limitation of mono-mode switched routers is shown at first, followed by algorithm exploration in the proposed router for choosing the proper working strategy in a specific network. We then present the performance improvement when applying the mixed circuit/wormhole switching mode to different applications, and analyze the image decoding as a case study. The multi-mode router has been implemented with different configurations in a 65 nm CMOS technology. The one with 8-bit flit width is demonstrated together with a multi-core processor to show the feasibility. Working at 350 MHz, the average power consumption of the whole system is 22 mW.
5.
  • Ma, Ning, et al. (authors)
  • Implementing MVC Decoding on Homogeneous NoCs : Circuit Switching or Wormhole Switching
  • 2015
  • Conference paper (peer-reviewed). Abstract:
    • To implement multiview video decoding on network on-chip (NoC) based homogeneous multicore architectures, the selection of switching techniques for routers is one of the most important aspects for design space exploration. Circuit switching and wormhole switching are two most feasible switching techniques for on-chip networks. To choose the suitable switching technique, we perform the comparison on decoding speed of the whole system, link utilization and delay between circuit switching and wormhole switching for implementing eight-view QVGA video decoding on 4 × 4 NoCs at 30 fps. The required link bandwidths are both around 800 Mbps with the similar network utilization and delay. We conclude that, to implement multiview video decoding on homogeneous NoCs, circuit switching is more suitable considering the similar performance and lower cost compared with wormhole switching.
6.
  • Ma, Ning, et al. (authors)
  • System design of full HD MVC decoding on mesh-based multicore NoCs
  • 2011
  • In: Microprocessors and microsystems. - : Elsevier BV. - 0141-9331 .- 1872-9436. ; 35:2, pp. 217-229
  • Journal article (peer-reviewed). Abstract:
    • Future multimedia applications such as full HD (1920 x 1080) multiview video coding (MVC) present great challenges on computing architectures. Even if with the state-of-the-art ASIC technology which can process single view HD decoding, dealing with multiple views would require times of computation capacity in proportion to the number of views, which is difficult to achieve. In this paper, we explore the system-level design space for full HD MVC applications mapped onto mesh-based multicore Network-on-Chip (NoC) architectures. To this end, we establish a simulation framework capable of simulating the combination of communication networks with computing cores. We investigate two task assignment schemes: picture-level assignment and view-level assignment. With an eight-view MVC decoding, we explore the design options with respect to network size, single-core performance and link bandwidth under both task assignment schemes. Our studies show that, to achieve a certain decoding performance, the computation capability and communication capacity should be balanced in the system. Also, to realize the eight-view HD decoding, the system only requires twice or less than twice of the single-core processing capacity required by single view decoding, thanks to the parallel computation and communication enabled by the multicore NoC architectures. Our results exhibit feasibility and potential of efficiently implementing the full HD MVC decoding on multicore NoC architectures.
7.
  • Ma, Ning (author)
  • Ultra-low-power Design and Implementation of Application-specific Instruction-set Processors for Ubiquitous Sensing and Computing
  • 2015
  • Doctoral thesis (other academic/artistic). Abstract:
    • The feature size of transistors keeps shrinking with the development of technology, which enables ubiquitous sensing and computing. However, with the breakdown of Dennard scaling caused by the difficulty of further lowering the supply voltage, the power density increases significantly. The consequence is that, for a given power budget, the energy efficiency must be improved for hardware resources to maximize the performance. Application-specific integrated circuits (ASICs) obtain high energy efficiency at the cost of low flexibility for various applications, while general-purpose processors (GPPs) gain generality at the expense of efficiency. To provide both high energy efficiency and flexibility, this dissertation explores the ultra-low-power design of application-specific instruction-set processors (ASIPs) for ubiquitous sensing and computing. Two application scenarios, i.e. high-throughput compute-intensive processing for multimedia and low-throughput low-cost processing for the Internet of Things (IoT), are implemented in the proposed ASIPs. Multimedia stream processing for human-computer interaction always features high data throughput. To design processors for networked multimedia streams, customizing application-specific accelerators controlled by the embedded processor is exploited. By abstracting the common features from multiple coding algorithms, video decoding accelerators are implemented for networked multi-standard multimedia stream processing. Fabricated in 0.13 μm CMOS technology, the processor running at 216 MHz is capable of decoding real-time high-definition video streams with a power consumption of 414 mW. When even higher throughput is required, such as in multi-view video coding applications, multiple customized processors will be connected with an on-chip network. Design problems are further studied for selecting the capability of single processors, the number of processors, the capacity of the communication network, as well as the task assignment schemes. In the IoT scenario, low processing throughput but high energy efficiency and adaptability are demanded for a wide spectrum of devices. In this case, a tile processor including a multi-mode router and dual cores is proposed and implemented. The multi-mode router supports both circuit and wormhole switching to facilitate inter-silicon extension for providing on-demand performance. The control-centric dual-core architecture uses control words to directly manipulate all hardware resources. Such a mechanism avoids introducing complex control logic, and the hardware utilization is increased. Programmable control words enable reconfigurability of the processor for supporting general-purpose ISAs, application-specific instructions and dedicated implementations. The idea of reducing global data transfer also increases the energy efficiency. Finally, a single tile processor together with a network of bare dies and a network of packaged chips has been demonstrated as the result. The processor is implemented in 65 nm low-leakage CMOS technology and achieves an energy efficiency of 101.4 GOPS/W for each core.
8.
  • She, Huimin, et al. (authors)
  • A Network-based System Architecture for Remote Medical Applications
  • 2007
  • In: Proceedings of the Asia-Pacific Advanced Network Meeting.
  • Conference paper (peer-reviewed). Abstract:
    • Nowadays, the evolution of wireless communication and network technologies enables remote medical services to be available everywhere in the world. In this paper, a network-based system architecture adopting wireless personal area network (WPAN) protocol IEEE 802.15.4/Zigbee standard and 3G communication networks for remote medical applications is proposed. In the proposed system, the number and type of medical sensors are scalable depending on individual needs. This feature allows the system to be flexibly applied in several medical applications. Furthermore, a differentiated service using priority scheduling and data compression is introduced. This scheme can not only reduce transmission delay for critical physiological signals and enhance bandwidth utilization at the same time, but also decrease power consumption of the hand-held personal server which uses battery as the energy source.
9.
  • She, Huimin, et al. (authors)
  • Analysis of Traffic Splitting Mechanisms for 2D Mesh Sensor Networks
  • 2008
  • In: International Journal of Software Engineering and Its Applications. - 1738-9984. ; 2:3
  • Journal article (peer-reviewed). Abstract:
    • For many applications of sensor networks, it is essential to ensure that messages are transmitted to their destinations within delay bounds and the buffer size of each sensor node is as small as possible. In this paper, we firstly introduce the system model of a mesh sensor network. Based on this system model, the expressions for deriving the delay bound and buffer requirement bound are presented using network calculus theory. In order to balance traffic load and improve resource utilization, three traffic splitting mechanisms are proposed. And the two bounds are derived in these traffic splitting mechanisms. To show how our method applies to real applications, we conduct a case study on a fresh food tracking application, which monitors the food freshness status in real-time during transportation. The numerical results show that the delay bound and buffer requirement bound are reduced while applying traffic splitting mechanisms. Thus the performance of the whole sensor network is improved with less cost.
10.
  • She, Huimin, et al. (authors)
  • Deterministic Worst-case Performance Analysis for Wireless Sensor Networks
  • 2008
  • In: Proceedings of the International Wireless Communications and Mobile Computing Conference. - 9781424422029 ; , pp. 1081-1086
  • Conference paper (peer-reviewed). Abstract:
    • Dimensioning wireless sensor networks requires formal methods to guarantee network performance and cost in any conditions. Based on network calculus, this paper presents a deterministic analysis method for evaluating the worst-case performance and buffer cost of sensor networks. To this end, we introduce three general traffic flow operators and derive their delay and buffer bounds. These operators are general because they can be used in combination to model any complex traffic flowing scenarios in sensor networks. Furthermore, our method integrates variable duty cycle to allow the sensor nodes to operate at lower rates thus saving power. Moreover, it incorporates traffic splitting mechanisms in order to balance network workload and nodes' buffers. To show how our method applies to real applications, we conduct a case study on a fresh food tracking application, which monitors the food freshness in realtime. The experimental results demonstrate that our method can be either used to perform network planning before deployment, or to conduct network reconfiguration after deployment.
  •  
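
The delay and buffer bounds mentioned in the abstracts of entries 9 and 10 can be illustrated with the textbook network-calculus result for a single node. This is only a minimal sketch, not the papers' own model: it assumes a token-bucket arrival curve (burst b, rate r) and a rate-latency service curve (rate R, latency T), for which the worst-case delay is T + b/R and the worst-case backlog is b + r·T when r ≤ R. The names `TokenBucket`, `RateLatency`, `delay_bound`, `backlog_bound` and `split_evenly` are hypothetical, and the even split is an illustrative assumption rather than one of the three splitting mechanisms the papers propose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Assumed arrival curve: alpha(t) = burst + rate * t."""
    burst: float  # bits
    rate: float   # bits per second


@dataclass(frozen=True)
class RateLatency:
    """Assumed service curve: beta(t) = rate * max(0, t - latency)."""
    rate: float     # bits per second
    latency: float  # seconds


def _check_stable(flow: TokenBucket, node: RateLatency) -> None:
    # The bounds are finite only if the node serves at least as fast
    # as the flow's long-term arrival rate.
    if flow.rate > node.rate:
        raise ValueError("arrival rate exceeds service rate; bounds are unbounded")


def delay_bound(flow: TokenBucket, node: RateLatency) -> float:
    """Worst-case delay in seconds: T + b / R."""
    _check_stable(flow, node)
    return node.latency + flow.burst / node.rate


def backlog_bound(flow: TokenBucket, node: RateLatency) -> float:
    """Worst-case buffer requirement in bits: b + r * T."""
    _check_stable(flow, node)
    return flow.burst + flow.rate * node.latency


def split_evenly(flow: TokenBucket, paths: int) -> TokenBucket:
    """Hypothetical even split: each path carries 1/paths of burst and rate."""
    if paths < 1:
        raise ValueError("paths must be at least 1")
    return TokenBucket(flow.burst / paths, flow.rate / paths)


if __name__ == "__main__":
    flow = TokenBucket(burst=1000.0, rate=2000.0)
    node = RateLatency(rate=10000.0, latency=0.01)
    half = split_evenly(flow, 2)
    print("unsplit:", delay_bound(flow, node), backlog_bound(flow, node))
    print("split  :", delay_bound(half, node), backlog_bound(half, node))
```

Under these assumptions, splitting a flow over two paths whose nodes offer the same service curve halves the burst each node sees, so both per-node bounds shrink. That is the qualitative effect the abstracts report for traffic splitting.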